Scalable Parallel Sparse Factorization with Left-right Looking Strategy on Shared Memory Multiprocessors

Authors

  • Olaf Schenk
  • Wolfgang Fichtner

Abstract

An efficient sparse LU factorization algorithm for popular shared memory multiprocessors is presented. Interprocess communication is critically important on these architectures, and the algorithm introduces only O(n) synchronization events. No global barrier is used, and a completely asynchronous scheduling scheme is a central point of the implementation. The algorithm aims at optimizing single-node performance and minimizing communication overhead. It has been successfully tested on SUN Enterprise, DEC AlphaServer, SGI Origin 2000, Cray T90, Cray J90, and NEC SX-4 parallel computers, delivering up to 2.3 GFlop/s on an eight-processor DEC AlphaServer for medium-size semiconductor device simulation and structural engineering problems.
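The idea of barrier-free scheduling with only O(n) synchronization events can be illustrated with a minimal sketch. It assumes a plain left-looking LU without pivoting (not the paper's left-right looking supernodal method), dense storage for brevity, and a hypothetical function name `parallel_left_looking_lu`. Each column j is one task that signals completion exactly once through its own `threading.Event`, and a task waits only on the columns k < j whose entries U[k][j] are nonzero. Python threads under the GIL show the scheduling structure here, not the speedup.

```python
import threading


def parallel_left_looking_lu(A, num_workers=4):
    """Hypothetical sketch: left-looking LU without pivoting, one column per task.

    Each column j signals completion exactly once via its own Event, so the whole
    factorization uses n synchronization objects and no global barrier. Dense
    storage is used for brevity; zeros in A still prune the dependency waits.
    """
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    U = [[0.0] * n for _ in range(n)]
    done = [threading.Event() for _ in range(n)]  # one completion flag per column
    errors = []
    next_col = [0]
    lock = threading.Lock()

    def factor_column(j):
        try:
            x = [float(A[i][j]) for i in range(n)]
            for k in range(j):
                if x[k] != 0.0:          # x[k] == U[k][j]: column j depends on column k
                    done[k].wait()       # wait for that column only, not for all
                    for i in range(k + 1, n):
                        x[i] -= L[i][k] * x[k]
            for i in range(j + 1):
                U[i][j] = x[i]
            L[j][j] = 1.0
            for i in range(j + 1, n):
                L[i][j] = x[i] / x[j]    # raises ZeroDivisionError on a zero pivot
        except ZeroDivisionError as exc:
            errors.append(exc)
        finally:
            done[j].set()                # always signal, so dependents never hang

    def worker():
        # Dynamic scheduling: claim the next unclaimed column in increasing order.
        while True:
            with lock:
                j = next_col[0]
                if j >= n:
                    return
                next_col[0] += 1
            factor_column(j)

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    if errors:
        raise errors[0]
    return L, U
```

For example, `parallel_left_looking_lu([[4.0, 1.0], [1.0, 4.0]], num_workers=2)` returns unit lower-triangular `L` and upper-triangular `U` with `L·U = A`. Because columns are claimed in increasing order and a column only ever waits on lower-numbered columns, the lowest unfinished column never blocks, so the scheme cannot deadlock.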


Similar resources

A Supernodal Cholesky Factorization Algorithm for Shared-Memory Multiprocessors

This paper presents a new left-looking parallel sparse Cholesky factorization algorithm for shared-memory MIMD multiprocessors. The algorithm is particularly well-suited for vector supercomputers with multiple processors, such as the Cray Y-MP. The new algorithm uses supernodes in the Cholesky factor to improve performance by reducing indirect addressing and memory traffic. Earlier factorizati...


Zürich Technische Hochschule

We present PARDISO, a new scalable parallel sparse direct linear solver on shared memory multiprocessors. In this paper, we describe the parallel factorization algorithm, which utilizes the supernode structure of the matrix to reduce the number of memory references with Level 3 BLAS. We also propose enhancements that significantly reduce the communication rate for pipelining parallelism. The res...


Parallel Solution of Sparse Linear Least Squares Problems on Distributed-Memory Multiprocessors

This paper studies the solution of large-scale sparse linear least squares problems on distributed-memory multiprocessors. The method of corrected semi-normal equations is considered. New block-oriented parallel algorithms are developed for solving the related sparse triangular systems. The arithmetic and communication complexities of the new algorithms applied to regular grid problems are anal...



GPU-Accelerated Parallel Sparse LU Factorization Method for Fast Circuit Analysis

Lower upper (LU) factorization for sparse matrices is the most important computing step for circuit simulation problems. However, parallelizing LU factorization on the graphic processing units (GPUs) turns out to be a difficult problem due to intrinsic data dependence and irregular memory access, which diminish GPU computing power. In this paper, we propose a new sparse LU solver on GPUs for ci...




Publication date: 1999